METHOD AND APPARATUS FOR PROCESSING
INTERAURAL TIME DELAY IN 3D DIGITAL AUDIO 

This application claims priority from U.S. Patent Application
No. 60/065,855 entitled "Multipurpose Digital Signal Processing System"
filed November 14, 1997, the specification of which is explicitly
incorporated herein by reference.

BACKGROUND OF THE INVENTION 
1. Field of the Invention

This invention relates generally to three dimensional (3D) 
sound. More particularly, it relates to a digital implementation of interaural 
time delays used in 3D digital sound applications. 

2. Background of Related Art



Many high-end consumer devices provide the option for
three-dimensional (3D) sound, allowing a more realistic experience when
listening to sound. In some applications, 3D sound allows a listener to
perceive motion of an object from the sound played back on a 3D audio
system.

Atal and Schroeder established cross-talk canceler
technology as early as 1962, as described in U.S. Patent No. 3,236,949,
which is explicitly incorporated herein by reference. The Atal-Schroeder
3D sound cross-talk canceler was an analog implementation using
specialized analog amplifiers and analog filters. To gain better sound
positioning performance using two loudspeakers, Atal and Schroeder
included empirically determined frequency dependent filters. Without
doubt, these sophisticated analog devices are not applicable for use with
today's digital audio technology.

Interaural time difference (ITD), i.e., the difference between the
times at which a sound wave reaches each of the two ears, is an important and
dominant parameter used in 3D sound design. The interaural time
difference is responsible for introducing binaural disparities in 3D audio or
acoustical displays. In particular, when a sound object moves in a
horizontal plane, a continuous interaural time delay occurs between the
instant that the sound from the object impinges upon one of the ears and the
instant that the same sound impinges upon the other ear. This ITD
is used to create aural images of sound moving in any desired direction
with respect to the listener.

The ears of a listener can be 'tricked' into believing sound is
emanating from a phantom location with respect to the listener by
appropriately delaying the sound wave with respect to at least one ear.
This typically requires appropriate cancellation of the original sound wave
with respect to the other ear, and appropriate cancellation of the
synthesized sound wave to the first ear.

Atal-Schroeder implemented the delays and cancellations
with appropriate analog filters and analog amplifiers, as shown herein in
Figs. 5 and 6. Figs. 5 and 6 herein are described in detail in the
Atal-Schroeder U.S. Pat. No. 3,236,949 with reference therein to Figs. 2 and 4,
respectively. Fig. 5 herein shows the conventional 3D sound system for
creating the image of sound from a phantom locality with respect to the
listener, while Fig. 6 herein shows the analog delay line with multiple tap
points implemented by Atal-Schroeder.

Thus, the interaural time delay is manipulated to synthesize
localities of the source of particular sounds, and to create the sense of
motion of particular sounds.

Conventional 3D sound systems embed the interaural time
difference in empirically determined head-related transfer functions
(HRTFs), typically determined with a mannequin head implanted with
microphones in its ears. The available delays typically have a relatively
large resolution, e.g., 100 microseconds, formed by null filter taps, as
disclosed by Atal-Schroeder.

However, there are at least two basic problems with the
implementation of the conventional analog approach in a digital
environment. First of all, the large resolution in the available time delays
causes discretely sampled interaural time differences for the expected
position of a listener. Thus, a 'closest' or 'best fit' ITD must be chosen,
which may be up to 50% away from the ideal parameter. This may cause
a jittering effect in the sense of movement of the sound perceived by the
listener. Moreover, implementation of a digital filter emulating the analog filter
having multiple taps as shown herein in Fig. 6 is computationally involved,
introducing a level of computational inefficiency into the system.

One conventionally proposed implementation of a digital 3D
sound system to provide a more accurate ITD based on the given
resolution has been to interpolate the entire HRTF set such that the ITD
becomes interpolated as well. Unfortunately, interpolation itself can
become a computationally intense requirement which likely adds to, rather
than cures, the computational inefficiency otherwise associated with
digital 3D sound systems.

There is thus a need for an efficient and simplified method 
and apparatus for providing digital 3D sound. 

SUMMARY OF THE INVENTION 

In accordance with the principles of the present invention, a
digital delay line for use in a 3D audio sound system comprises a first
delay module providing a choice of any delay within a first resolution. A
second delay module is in series with the first delay module. The second
delay module provides a choice of any of a plurality of additional fractional
delays. Each of the additional fractional delays is less than the first
resolution.

A method for providing an interaural time delay in a digital
3D sound system in accordance with another aspect of the present
invention comprises selecting one of a plurality of available first time
delays having a first resolution between each of the plurality of available
first time delays. Additionally, one of a plurality of available second time
delays is selected. Each of the plurality of available second time delays is
less than the first resolution. The selected first time delay is added to the
second time delay to provide a desired interaural time delay.



BRIEF DESCRIPTION OF THE DRAWINGS 

Features and advantages of the present invention will
become apparent to those skilled in the art from the following description
with reference to the drawings, in which:

Fig. 1 is a block diagram showing the digital 3D sound
system including a digital interaural delay line, in accordance with the
principles of the present invention.

Fig. 2 is a more detailed diagram showing the digital 3D
sound system for creating 3D sound in a digital environment, in
accordance with the principles of the present invention.

Fig. 3 is a diagram showing the implementation of multiple 
digital audio streams using a common bank of fractional delay filters, in 
accordance with the principles of the present invention. 

Fig. 4 shows a process for creating an improved ITD look-up
table suitable for use with 3D sound
applications as shown in Figs. 1 and 2, in accordance with the principles
of the present invention.

Fig. 5 shows a conventional 3D sound system for creating
the image of sound from a phantom locality with respect to the listener.

Fig. 6 shows a conventional analog delay line with multiple
tap points implemented by Atal-Schroeder.

DETAILED DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS 

In accordance with the principles of the present invention,
the ITD is extracted from measured and empirically determined HRTFs,
smoothed, and implemented in a look-up table. Implementation of the ITD
is provided by a delay line including both an integer portion providing
rough estimate delays and a fractional portion providing a very accurate
delay and eliminating discontinuities in the listening field to provide a more
relaxed listening sweet spot.

The present invention provides a digital filter bank having a
simple and inherently low cost architecture for performing a stable
cross-talk cancellation, providing excellent localization and externalization of
virtual sound images.

In accordance with the principles of the present invention,
head-related transfer functions corresponding to the speaker positions are
recorded and used to construct the filter coefficients. The relationship
between the speaker position and filter design was studied to provide a
more relaxed listening "sweet spot" where the 3D sound effects are
optimized. Thus, the listener does not have to sit in a very accurately
placed position with respect to the loudspeakers to appreciate the 3D
aspects of the audio rendered by only two loudspeakers.

Fig. 1 is a block diagram showing the basic components of
the disclosed embodiment of a digital 3D sound system including a digital
interaural time delay line, in accordance with the principles of the present
invention.

In particular, a sound source 220 is input into a digital
interaural time delay line 254. The interaural delay line 254 includes an
integer delay module 250 providing a rough estimate of the desired
interaural time delay, and a fractional delay module 252 providing a highly
refined additional time delay. In the disclosed embodiment, the
particular settings of both the integer delay module 250 and the fractional
delay module 252 are chosen from among a plurality of predetermined
delays, greatly reducing or eliminating the otherwise intensive calculations
necessary to interpolate a particular interaural time delay.

The particular delay associated with the left (or right) ear
signal 260 and the right (or left) ear signal 262 providing the desired
localization of the sound image is provided by a localization control
module 270.

Fig. 2 is a more detailed diagram showing the digital 3D 
sound system shown in Fig. 1. 

In particular, the integer delay module 250 of the disclosed
embodiment is comprised of a first-in, first-out (FIFO) buffer 204. The
FIFO buffer 204 may be of any suitable width, e.g., 16 bits, corresponding
to the length of the digital audio samples. Moreover, the length of the
FIFO buffer 204 will be based on the largest delay necessary to
implement the desired 3D sound imaging. The particular delay is related
to the selected number of clock cycles after the particular digital audio
sample was input to the FIFO buffer 204. This selection of an integer
delay time is represented in Fig. 2 with a multiplex switch 206. The
digital audio samples 224a-224d are fed serially into
the FIFO buffer 204, with the arrows from each of the samples 224a-224d
representing tap numbers.
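
By way of illustration only, the tapped FIFO buffer described above may be sketched in Python as follows. The names IntegerDelayLine, push, tap and delay_stream are hypothetical and do not appear in the disclosure; the sketch assumes that one sample is pushed per clock cycle and that tap 0 corresponds to the most recent sample.

```python
from collections import deque


class IntegerDelayLine:
    """Hypothetical sketch of a tapped FIFO giving whole-sample delays."""

    def __init__(self, max_delay):
        # Hold the newest sample plus max_delay older ones, initially silent.
        self._buf = deque([0] * (max_delay + 1), maxlen=max_delay + 1)

    def push(self, sample):
        # The newest sample enters at index 0; the oldest drops off the far end.
        self._buf.appendleft(sample)

    def tap(self, delay):
        # Tap `delay` returns the sample pushed `delay` clock cycles ago.
        return self._buf[delay]


def delay_stream(samples, delay, max_delay=8):
    """Return `samples` delayed by `delay` whole sample periods."""
    line = IntegerDelayLine(max_delay)
    out = []
    for s in samples:
        line.push(s)
        out.append(line.tap(delay))
    return out
```

In this sketch the choice of tap plays the role of the multiplex switch 206: changing the delay changes only which stored sample is read, and no sample is recomputed.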

The clock cycle of the FIFO buffer 204 relates to one over
the sample rate. Thus, with an exemplary sample rate of 22 kiloHertz, the
'integer' portion, or resolution of the integer delay module 250 is 1/22,000
or approximately 45 microseconds (uS).

The second portion of the digital interaural delay line 254
provides a much more refined 'fractional' delay with a fractional delay
module 252. This fractional delay is provided by the selection of any one
of a plurality of fractional delay filters 208-212.

The fractional delay module 252 effectively produces an
adjustable digital delay with a finer resolution than the integer delay
module 250. Each of the fractional delay filters 208-212 is a so-called
all-pass filter that has a variable phase shift, corresponding to the required
fractional delay. The number of phases (i.e., fractional delay filters
208-212) is determined empirically by behavioral testing of human listening.

In the disclosed embodiment, 64 fractional delay filters are
utilized, each providing an incrementally greater delay, in finely resolved
increments suitable to the application. For instance, at the exemplary
sample rate of 22 kiloHertz, the resolution between the fractional delay
filters 208-212 is (45 uS)/64, or about 0.7 uS resolution. This particular
fine resolution (and the rough estimate resolution provided by the integer
delay module 250) can be adjusted based on the needs of the particular
application.
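
The arithmetic behind these resolutions can be checked with a short Python sketch. The function names are hypothetical; the sample rate and phase count are the exemplary values given above.

```python
def integer_resolution_us(sample_rate_hz):
    """Coarse resolution: one sample period, in microseconds."""
    return 1e6 / sample_rate_hz


def fractional_resolution_us(sample_rate_hz, num_phases):
    """Fine resolution: one sample period divided among the filter phases."""
    return integer_resolution_us(sample_rate_hz) / num_phases


coarse_us = integer_resolution_us(22_000)       # about 45.5 uS
fine_us = fractional_resolution_us(22_000, 64)  # about 0.71 uS
```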

Each fractional delay filter 208-212 is a finite impulse
response (FIR) filter, i.e., a polyphase filter, effecting the desired delay.
Each of the fractional delay filters 208-212, and/or the fractional delay
controlled switch 216 and/or the multiplexer 214 can be implemented in
any suitable processor, e.g., in a digital signal processor (DSP),
microprocessor, or microcontroller. Alternatively, the digital filters can be
implemented in hardware in accordance with the principles of the present
invention.
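
The disclosure does not specify how the coefficients of the fractional delay filters are designed. As one possible illustration, the following Python sketch builds a bank of short FIR filters using Lagrange interpolation, a well-known fractional delay design substituted here as an assumption. The function names, the filter order of 3, and the constant one-sample bulk delay carried by every filter in the bank are likewise assumptions of the sketch.

```python
def lagrange_fractional_delay(delay, order=3):
    """FIR coefficients that delay a signal by `delay` samples (Lagrange design)."""
    coeffs = []
    for n in range(order + 1):
        h = 1.0
        for k in range(order + 1):
            if k != n:
                h *= (delay - k) / (n - k)
        coeffs.append(h)
    return coeffs


def build_filter_bank(num_phases=64, order=3):
    """One filter per phase; phase p adds p/num_phases of a sample period."""
    bulk = (order - 1) // 2  # constant whole-sample delay shared by all phases
    return [lagrange_fractional_delay(bulk + p / num_phases, order)
            for p in range(num_phases)]


def fir(coeffs, samples):
    """Direct-form FIR filtering, treating samples before the start as zero."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, h in enumerate(coeffs):
            if n - k >= 0:
                acc += h * samples[n - k]
        out.append(acc)
    return out
```

Under these assumptions, filtering a ramp with phase 32 of a 64-phase bank yields the same ramp delayed by 1.5 samples once the filter has filled; the shared one-sample bulk delay could be absorbed by selecting an integer tap one position earlier.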

In the exemplary embodiment utilizing a sampling rate of 22
kiloHertz, the first fractional delay filter 208 provides 0.7 uS delay to a
digital audio sample passing therethrough, the second fractional delay
filter 210 provides approximately 1.4 uS delay, etc., until the last fractional
delay filter 212 which provides approximately 44.3 uS delay.

Selection of the appropriate fractional delay filter 208-212 is
implemented by a multiplexer 214 in the fractional delay module 252. In
the shown embodiment, the fractional delay filters 208-212 are each
implemented in a processor, e.g., in a digital signal processor, and
selection of an appropriate one of the fractional delay filters 208-212 is
desirable at the front end to avoid wasted computational power by running
fractional delay filters 208-212 which are not being used for that particular
audio sample.

The interaural time delay is controlled by the localization
control module 270, which includes a 3D audio application source position
controller 222, an interaural time delay (ITD) look-up table 220, and an
integral and fractional delay selector 218. In the disclosed embodiment,
the localization control module 270 is implemented in a suitable
processor, e.g., in a microprocessor, microcontroller, or digital signal
processor (DSP). Of course, the localization control module 270 may
alternatively be partially or wholly implemented in hardware, e.g., using
programmable array logic.

The 3D audio application source position controller 222 selects
a desired 'phantom' position of the sound sample currently being input to
the digital interaural delay line 254. The desired location may have a
desired x, y and z coordinate with respect to a reference point, e.g., the
center of the listener's head. Based on the desired location, an
associated ITD is determined in the ITD look-up table 220. The integral
and fractional delay selector 218 determines the largest integer value which
can be achieved within the resolution of the integer delay module 250
without exceeding the desired ITD, and appropriately controls the integer
delay module 250 to provide that desired delay to the audio sample.
Similarly, the remainder or fractional portion of the desired ITD which is
not provided by the integer delay module 250 is provided by an
appropriate selection of a desired one of the available fractional delay
filters 208-212 in the fractional delay module 252.
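
As a minimal sketch of the selection just described, the following Python splits a desired ITD into a whole number of sample periods and the index of the nearest fractional delay filter. The function names are hypothetical, and the sketch assumes that filter index 0 contributes no fractional delay and that index p contributes p/64 of a sample period; the disclosure does not state the indexing.

```python
import math


def split_itd(itd_us, sample_rate_hz=22_000, num_phases=64):
    """Split a desired ITD into (integer taps, fractional filter index)."""
    total_samples = itd_us * sample_rate_hz / 1e6
    # Largest whole number of sample periods not exceeding the desired ITD.
    integer_taps = math.floor(total_samples)
    # Nearest available fractional step for the remainder.
    phase = round((total_samples - integer_taps) * num_phases)
    if phase == num_phases:
        # Remainder rounded up to a full sample: carry into the integer part.
        integer_taps += 1
        phase = 0
    return integer_taps, phase


def realized_delay_us(integer_taps, phase, sample_rate_hz=22_000, num_phases=64):
    """Delay actually produced by the chosen tap and filter, in microseconds."""
    return (integer_taps + phase / num_phases) * 1e6 / sample_rate_hz
```

For a desired ITD of 100 uS at 22 kiloHertz, this sketch selects 2 integer taps and fractional filter index 13, giving a realized delay within half of one fractional step of the target.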

Fig. 3 is a diagram showing the implementation of multiple
digital audio streams using a common bank of fractional delay filters, in
accordance with the principles of the present invention. Thus, the
plurality of fractional delay filters 208-212 can be utilized by a plurality of
audio sources for the same listener, avoiding the need to duplicate the
fractional delay module 252 for each audio source.

Fig. 4 shows a process for creating the ITD look-up table
220 shown in Fig. 2.

In particular, in step 102, binaural impulse responses are 
empirically measured with a sound source at various locations around the 
listening environment, e.g., at incremental points along a sphere about the 
sound source. 

In step 104, the ITD information is extracted from the
empirically measured information obtained in step 102, and a 'mesh' of
ITD values for each appropriate point on the sphere is determined. In
particular, the ITD samples may be extracted from measured left-right ear
head-related transfer functions (HRTFs) using cross-correlation. These
samples can be viewed as discrete samples of an underlying continuous
ITD function of azimuth and elevation coordinates.
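
The cross-correlation extraction of step 104 might be sketched in Python as follows. The function names and the sign convention (a positive lag means the sound reaches the right ear later) are assumptions of the sketch, and the search is limited to whole-sample lags.

```python
def estimate_itd_samples(left, right, max_lag):
    """Return the lag, in samples, that maximizes the left-right cross-correlation."""
    best_lag = 0
    best_corr = float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        corr = 0.0
        for n in range(len(left)):
            m = n + lag
            if 0 <= m < len(right):
                corr += left[n] * right[m]
        if corr > best_corr:
            best_corr = corr
            best_lag = lag
    return best_lag


def lag_to_us(lag, sample_rate_hz):
    """Convert a lag in samples to microseconds."""
    return lag * 1e6 / sample_rate_hz
```

Applying such a routine to each measured pair of impulse responses would yield one ITD sample per measurement position, which together form the mesh.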

In step 106, to avoid the 'jittering' and other undesirable
effects for the listener, the ITD mesh determined in step 104 is smoothed
using any appropriate smoothing algorithm. For instance, the ITD
samples may be regularized using a "generalized spline model" or
appropriately filtered and interpolated by a two-dimensional filter to gain
smoothness and continuity. While this smoothing may be calculation
intensive, it is performed once, off-line, and not performed in real-time as
digital audio samples are received.
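
As a simple stand-in for the two-dimensional filtering mentioned above, the following Python sketch applies a 3x3 moving average to an ITD mesh. It is not the generalized spline model. The layout of the mesh (rows indexed by elevation, columns by azimuth, with azimuth wrapping around and elevation not wrapping) and the function name are assumptions of the sketch.

```python
def smooth_itd_mesh(mesh):
    """Smooth an ITD mesh (rows: elevation, columns: azimuth) with a 3x3 mean."""
    rows, cols = len(mesh), len(mesh[0])
    smoothed = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            total = 0.0
            count = 0
            for dr in (-1, 0, 1):
                rr = r + dr
                if not 0 <= rr < rows:
                    continue  # elevation does not wrap; skip rows off the mesh
                for dc in (-1, 0, 1):
                    total += mesh[rr][(c + dc) % cols]  # azimuth wraps around
                    count += 1
            out_row.append(total / count)
        smoothed.append(out_row)
    return smoothed
```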

In step 108, the smoothed ITD mesh is input into the ITD
look-up table 220. The ITD mesh may utilize any appropriate coordinate
system, e.g., spherical coordinates or a standard x, y and z coordinate
system.

In the disclosed embodiment it was determined that the
finest time resolution of the overall delay, i.e., the combination of the delay
provided by the integer delay module 250 and the fractional delay module
252, is preferably less than 1 microsecond (uS) such that any
discontinuity caused in the sound stream is under the hearing threshold of
a typical human. In the case of a high sampling rate, finer time
resolution may be preferred. For example, with a 22.05 kiloHertz
sampling rate of an audio stream, a 64-phase polyphase filter was used to
obtain sub-microsecond resolution in the time delay. In another example,
a 60-phase polyphase filter was used to provide the necessary time
delays for a suitable presentation of an audio stream sampled at 48
kiloHertz.

While the fractional delay filters 208-212 in the disclosed
embodiment are each a FIR (polyphase) filter, the principles of the
present invention are equally applicable to the use of other filters or digital
delays which provide the required delay in a digital audio sample.

The digital interaural delay line 254 in accordance with the
principles of the present invention can be implemented in any suitable
processor or computer system. For instance, the digital interaural delay
line 254 can be implemented at a host level in a personal computer (PC)
based platform using regular instruction sets or MMX™ technology, or can
be implemented in a digital signal processor (DSP).

To further improve upon efficiency in accordance with the
principles of the present invention, the delay may be fixed for one ear, and
varied for the sound intended for the other ear, according to the desired
movement of the source sound. This alternative method may save as
many as half of the instruction cycles required to otherwise process a
variably delayed sound to both ears.
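
A minimal Python sketch of this alternative is given below, assuming that the fixed delay is zero and that only whole-sample delays are applied; the fractional portion is omitted for brevity. The function name and the sign convention (a positive value delays the right ear signal) are hypothetical.

```python
def render_binaural(mono, itd_samples):
    """Return (left, right), delaying only one ear by |itd_samples| samples."""
    lag = abs(itd_samples)
    direct = list(mono)  # the undelayed ear: passed through unchanged
    # The delayed ear: prepend silence and keep the original length.
    delayed = ([0.0] * lag + direct)[:len(direct)]
    if itd_samples >= 0:
        return direct, delayed
    return delayed, direct
```

Only the delayed channel passes through the delay line in this sketch, which is the source of the saving in instruction cycles noted above.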

The appropriately delayed left and right ear signals can be
forwarded to a next stage for further processing, or sent directly to
headphones or loudspeakers for presentation to the listener, as a simple
binaural signal processing method.

Thus, in accordance with the principles of the present
invention, a solution to the problem of generating a proper interaural time
delay in 3D audio and acoustical virtual display applications is
implemented with the requirement for little processing delay. The
principles of the present invention save instruction cycles of a processor
over conventional interpolation techniques, and use of the FIFO buffer
204 eliminates the need for the storage of a suitable plurality of null taps
in each of the many otherwise required conventional HRTF filters. The
saved processing power can be used for other purposes, e.g., to enhance
the HRTF effects.

Since ITDs are extracted, processed, and implemented
separately in a roughly resolved delay module (i.e., the integer delay
module 250), and in a finely tuned delay module (i.e., the fractional delay
module 252), the 3D audio effects can be easily controlled and adjusted
to suit other special requirements, e.g., to be optimized for different head
sizes. The super resolution sub-sample filtering polyphase filter based
delay lines in accordance with the principles of the present invention
introduce necessary delay without introducing discontinuity or 'clicks' in
the presentation to the listener.

The principles of the present invention are applicable for use
in any 3D audio system that uses an interaural time delay as a localization
cue for perceived direction of the sound by the listener. For instance,
the present invention relates to 3D sound positioning in gaming,
virtualizing multiple loudspeaker array systems having two physical
speakers in AC3/Dolby™ Digital systems, advanced computer user
interfaces, virtual acoustic reality software for architectural walk-throughs,
auralization hardware/software, 3D enhancement for general stereo and
wireless headphone sets, etc.

While the invention has been described with reference to the 
exemplary embodiments thereof, those skilled in the art will be able to 
make various modifications to the described embodiments of the invention 
without departing from the true spirit and scope of the invention. 






